Faucet: streaming de novo assembly graph construction
Internal identifier: 000D84 (Main/Exploration); previous: 000D83; next: 000D85
Authors: Roye Rozov [Israel]; Gil Goldshlager [United States]; Eran Halperin [United States]; Ron Shamir [Israel]
Source:
- Bioinformatics [1367-4803]; 2017.
French descriptors
- KwdFr :
- MESH :
- génétique : Microbiote.
- Algorithmes, Analyse de séquence d'ADN, Génomique, Humains, Logiciel, Métagénome.
English descriptors
- KwdEn :
- MESH :
- genetics : Microbiota.
- methods : Genomics, Sequence Analysis, DNA.
- Algorithms, Humans, Metagenome, Software.
Abstract
We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased.
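The streaming idea described above can be illustrated with a minimal sketch. It is not Faucet's implementation: the function names `iter_fastq_reads` and `stream_kmers`, the assumption of a strict four-line FASTQ layout, and the exact in-memory Python set are all hypothetical choices made for illustration. The point is only that each read is consumed from an iterator (a file, a pipe, or a download stream), contributes its k-mers, and is then dropped.

```python
import io
from typing import Iterable, Iterator, Set


def iter_fastq_reads(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record, one at a time."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the bases
            yield line.strip().upper()


def stream_kmers(lines: Iterable[str], k: int) -> Set[str]:
    """Collect distinct k-mers while reading; no read is kept after use."""
    kmers: Set[str] = set()
    for read in iter_fastq_reads(lines):
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


if __name__ == "__main__":
    # A tiny in-memory stand-in for a remote or piped FASTQ stream.
    demo = io.StringIO("@r1\nACGTAC\n+\nIIIIII\n@r2\nCGTACG\n+\nIIIIII\n")
    print(sorted(stream_kmers(demo, k=4)))
```

Because `stream_kmers` only needs an iterable of lines, the same function could be fed from `sys.stdin` while a download is in progress. Memory then grows with the number of distinct k-mers rather than with the number of reads.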
Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show that these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain most salient information needed for assembly, and demonstrate that they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched the performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on the metagenomes tested, Faucet's outputs had 14–110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available.
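As a rough illustration of coverage counts collected at junction k-mers, the sketch below takes a junction to be a k-mer with more than one predecessor or more than one successor in the de Bruijn graph. That definition, the helper names (`kmers_of`, `find_junctions`, `junction_coverage`), the exact Python set in place of any compact data structure, and the omission of reverse complements are all assumptions made for this example. The two passes shown are one plausible way to gather such counts and are not a description of Faucet's internals.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, Set

BASES = "ACGT"


def kmers_of(read: str, k: int) -> Iterator[str]:
    """Yield every k-mer of a read, left to right."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def find_junctions(kmers: Set[str]) -> Set[str]:
    """Return k-mers with more than one successor or predecessor."""
    junctions = set()
    for kmer in kmers:
        successors = sum((kmer[1:] + b) in kmers for b in BASES)
        predecessors = sum((b + kmer[:-1]) in kmers for b in BASES)
        if successors > 1 or predecessors > 1:
            junctions.add(kmer)
    return junctions


def junction_coverage(
    open_reads: Callable[[], Iterable[str]], k: int
) -> Dict[str, int]:
    """Two passes over a re-openable read stream; reads are never stored."""
    kmers: Set[str] = set()
    for read in open_reads():  # pass 1: collect the k-mer set
        kmers.update(kmers_of(read, k))
    junctions = find_junctions(kmers)
    counts: Counter = Counter()
    for read in open_reads():  # pass 2: count coverage at junctions only
        for kmer in kmers_of(read, k):
            if kmer in junctions:
                counts[kmer] += 1
    return dict(counts)


if __name__ == "__main__":
    reads = ["ACGTA", "ACGTC", "ACGTA"]
    print(junction_coverage(lambda: iter(reads), k=3))
```

In the demo, `CGT` is the only junction because it is followed by both `GTA` and `GTC`. Keeping counts only at such branching k-mers is what keeps the stored metadata small relative to the full k-mer set.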
Faucet is available at https://github.com/Shamir-Lab/Faucet
Url:
DOI: 10.1093/bioinformatics/btx471
PubMed: 29036597
PubMed Central: 5870852
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000B20
- to stream Pmc, to step Curation: 000B20
- to stream Pmc, to step Checkpoint: 000854
- to stream PubMed, to step Corpus: 000B15
- to stream PubMed, to step Curation: 000B15
- to stream PubMed, to step Checkpoint: 000914
- to stream Ncbi, to step Merge: 001C18
- to stream Ncbi, to step Curation: 001C18
- to stream Ncbi, to step Checkpoint: 001C18
- to stream Main, to step Merge: 000D87
- to stream Main, to step Curation: 000D84
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Faucet: streaming <italic>de novo</italic>
assembly graph construction</title>
<author><name sortKey="Rozov, Roye" sort="Rozov, Roye" uniqKey="Rozov R" first="Roye" last="Rozov">Roye Rozov</name>
<affiliation wicri:level="1"><nlm:aff id="btx471-aff1">Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Goldshlager, Gil" sort="Goldshlager, Gil" uniqKey="Goldshlager G" first="Gil" last="Goldshlager">Gil Goldshlager</name>
<affiliation wicri:level="2"><nlm:aff id="btx471-aff2">Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Halperin, Eran" sort="Halperin, Eran" uniqKey="Halperin E" first="Eran" last="Halperin">Eran Halperin</name>
<affiliation wicri:level="2"><nlm:aff id="btx471-aff3">Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
<affiliation wicri:level="1"><nlm:aff id="btx471-aff1">Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">29036597</idno>
<idno type="pmc">5870852</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870852</idno>
<idno type="RBID">PMC:5870852</idno>
<idno type="doi">10.1093/bioinformatics/btx471</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000B20</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B20</idno>
<idno type="wicri:Area/Pmc/Curation">000B20</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B20</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000854</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000854</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:29036597</idno>
<idno type="wicri:Area/PubMed/Corpus">000B15</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000B15</idno>
<idno type="wicri:Area/PubMed/Curation">000B15</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000B15</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000914</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000914</idno>
<idno type="wicri:Area/Ncbi/Merge">001C18</idno>
<idno type="wicri:Area/Ncbi/Curation">001C18</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001C18</idno>
<idno type="wicri:Area/Main/Merge">000D87</idno>
<idno type="wicri:Area/Main/Curation">000D84</idno>
<idno type="wicri:Area/Main/Exploration">000D84</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Faucet: streaming <italic>de novo</italic>
assembly graph construction</title>
<author><name sortKey="Rozov, Roye" sort="Rozov, Roye" uniqKey="Rozov R" first="Roye" last="Rozov">Roye Rozov</name>
<affiliation wicri:level="1"><nlm:aff id="btx471-aff1">Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Goldshlager, Gil" sort="Goldshlager, Gil" uniqKey="Goldshlager G" first="Gil" last="Goldshlager">Gil Goldshlager</name>
<affiliation wicri:level="2"><nlm:aff id="btx471-aff2">Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA</wicri:regionArea>
<placeName><region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Halperin, Eran" sort="Halperin, Eran" uniqKey="Halperin E" first="Eran" last="Halperin">Eran Halperin</name>
<affiliation wicri:level="2"><nlm:aff id="btx471-aff3">Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA</wicri:regionArea>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
<affiliation wicri:level="1"><nlm:aff id="btx471-aff1">Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint><date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Genomics (methods)</term>
<term>Humans</term>
<term>Metagenome</term>
<term>Microbiota (genetics)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Génomique ()</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Microbiote (génétique)</term>
<term>Métagénome</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Microbiota</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Microbiote</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Genomics</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Humans</term>
<term>Metagenome</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Génomique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Métagénome</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><title>Abstract</title>
<sec id="s1"><title>Motivation</title>
<p>We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased.</p>
</sec>
<sec id="s2"><title>Results</title>
<p>Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show that these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain most salient information needed for assembly, and demonstrate that they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched the performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on the metagenomes tested, Faucet's outputs had 14–110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available.</p>
</sec>
<sec id="s3"><title>Availability and implementation</title>
<p>Faucet is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/Shamir-Lab/Faucet">https://github.com/Shamir-Lab/Faucet</ext-link>
</p>
</sec>
<sec id="s5"><title><xref ref-type="supplementary-material" rid="sup1">Supplementary information</xref>
</title>
<p><xref ref-type="supplementary-material" rid="sup1">Supplementary data</xref>
are available at <italic>Bioinformatics</italic>
online.</p>
</sec>
</div>
</front>
</TEI>
<affiliations><list><country><li>Israël</li>
<li>États-Unis</li>
</country>
<region><li>Californie</li>
<li>Massachusetts</li>
</region>
</list>
<tree><country name="Israël"><noRegion><name sortKey="Rozov, Roye" sort="Rozov, Roye" uniqKey="Rozov R" first="Roye" last="Rozov">Roye Rozov</name>
</noRegion>
<name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
</country>
<country name="États-Unis"><region name="Massachusetts"><name sortKey="Goldshlager, Gil" sort="Goldshlager, Gil" uniqKey="Goldshlager G" first="Gil" last="Goldshlager">Gil Goldshlager</name>
</region>
<name sortKey="Halperin, Eran" sort="Halperin, Eran" uniqKey="Halperin E" first="Eran" last="Halperin">Eran Halperin</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D84 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000D84 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= PMC:5870852 |texte= Faucet: streaming de novo assembly graph construction }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:29036597" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
       | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.